DNA sequencing and parametric deconvolution

Author

  • Lei Li
Abstract

Sanger DNA sequencing is one of the key practices of the Human Genome Project. Its data-analysis step, called base-calling, attempts to reconstruct target DNA sequences from the fluorescence intensities generated by sequencing machines. In this paper, we present our modeling framework for DNA sequencing, from which a base-calling scheme arises naturally. A large portion of DNA sequencing errors comes from the diffusion effect in electrophoresis, and deconvolution is the tool for addressing this problem. We present a new version of parametric deconvolution, motivated by the spike-convolution model, together with some recently obtained results on its asymptotics. One application of the asymptotics is to examine the resolution issue from the perspective of confidence intervals. We also report an empirical study of the progressiveness of electrophoretic diffusion, carried out by estimating the slowly changing width parameter in the spike-convolution model. Furthermore, we include an example of complete preprocessing of DNA sequencing data.
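To make the spike-convolution model and the resolution issue concrete, here is a minimal simulation sketch. It is not the paper's parametric deconvolution method. It only generates a trace as a sum of sparse positive spikes blurred by a point-spread function, with optional additive noise. The Gaussian kernel, the spike locations and heights, and all function names are assumptions chosen for illustration.

```python
import math
import random


def gaussian_kernel(t, width):
    """Unit-area Gaussian point-spread function (an assumed kernel shape)."""
    return math.exp(-0.5 * (t / width) ** 2) / (width * math.sqrt(2.0 * math.pi))


def spike_convolution(times, locations, heights, width, noise_sd=0.0, seed=0):
    """Sample y(t) = sum_j a_j * k(t - tau_j; width) + noise on a time grid."""
    rng = random.Random(seed)
    signal = []
    for t in times:
        clean = sum(a * gaussian_kernel(t - tau, width)
                    for tau, a in zip(locations, heights))
        signal.append(clean + rng.gauss(0.0, noise_sd))
    return signal


def count_peaks(signal, floor=1e-3):
    """Count local maxima above a small floor (a crude proxy for resolution)."""
    return sum(
        1
        for i in range(1, len(signal) - 1)
        if signal[i] > floor
        and signal[i] > signal[i - 1]
        and signal[i] >= signal[i + 1]
    )


# Hypothetical example: three positive spikes, two of them close together.
times = [0.1 * i for i in range(200)]
locations = [5.0, 6.0, 12.0]
heights = [1.0, 0.8, 1.2]

narrow = spike_convolution(times, locations, heights, width=0.2)  # little diffusion
wide = spike_convolution(times, locations, heights, width=0.8)    # strong diffusion

print("peaks visible with narrow kernel:", count_peaks(narrow))
print("peaks visible with wide kernel:", count_peaks(wide))
```

In this noiseless example the narrow kernel leaves all three spikes visible as separate peaks, while the wide kernel merges the two neighbouring spikes into one. That loss of resolution is what deconvolution aims to undo, and a width parameter that grows slowly along the trace would make it progressively worse.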


Similar resources

Parametric deconvolution of positive spike trains

This paper describes a parametric deconvolution method (PDPS) appropriate for a particular class of signals which we call spike-convolution models. These models arise when a sparse spike train (Dirac deltas, according to our mathematical treatment) is convolved with a fixed point-spread function, and additive noise or measurement error is superimposed. We view deconvolution as an estimation problem,...


Deconvolution of sparse positive spikes

Deconvolution is usually regarded as one of the ill-posed problems in applied mathematics if no constraints on the unknowns are assumed. In this paper, we discuss the idea of well-defined statistical models being a counterpart of the notion of well-posedness. We show that constraints on the unknowns such as positivity and sparsity can go a long way towards overcoming the ill-posedness in deconvo...


Iterative Deconvolution for Automatic Basecalling of the DNA Electrophoresis Time Series

In DNA (deoxyribonucleic acid) sequencing, there are four possible chemical base types: adenine (A), cytosine (C), guanine (G), thymine (T), which contain genetic information. The four base types are identified by examining four DNA electrophoresis time series. This procedure is called “basecalling”. However, in practice, there are many other undesired signal features that prevent the accurate ...



Deconvolution of Sparse Positive Spikes: Is It Ill-posed?

Deconvolution is usually regarded as one of the so-called ill-posed problems of applied mathematics if no constraints on the unknowns can be assumed. In this paper, we discuss the idea of well-defined statistical models being a counterpart of the notion of well-posedness. We show that constraints on the unknowns such as non-negativity and sparsity can help a great deal to get over the inherent i...



Publication date: 2001